EP 0 797

Description 

[0001] The present invention relates to a method and
apparatus for audio communication over a data network.
[0002] Conventionally, voice signals have been transmitted
over standard telephone lines. However, with the
increase in locations provided with local area networks
(LANs) and the growing importance of multimedia
communications, there has been considerable interest in the
use of LANs to carry voice signals. This work is described,
for example, in "Using Local Area Networks for
Carrying Online Voice" by D Cohen, pages 13-21, and
"Voice Transmission over an Ethernet Backbone" by P
Ravasio, R Marcogliese, and R Novarese, pages 39-65,
both in "Local Computer Networks" (edited by P Ravasio,
G Hopkins, and N Naffah; North Holland, 1982). The
basic principles of such a scheme are that a first terminal
or workstation digitally samples a voice input signal at
a regular rate (eg 8 kHz). A number of samples are then
assembled into a data packet for transmission over the
network to a second terminal, which then feeds the
samples to a loudspeaker or equivalent device for playout,
again at a constant 8 kHz rate.

[0003] Conventional older audio communication systems
comprise a central mixing hub and audio conferencing
terminals connected thereto in a star network.
The central hub receives audio signals from each terminal
and produces a composite signal therefrom. The
composite signal is then transmitted back to each terminal
less that terminal's own audio signal. This is to be
contrasted against LAN audio conferencing systems
which have a more distributed architecture. Each terminal
must receive via a data network the audio signals of
all of the other terminals connected to the data network.
Receiving the audio signals in parallel requires a large
amount of bandwidth.

[0004] The bandwidth requirement of an audio communication
system using a scheme as described above
varies according to the number of users of the system.
For example, in an audio communication system which
encodes audio as 8 bit pulse code modulation sampled
at 8 kHz: a two way audio conference requires a bandwidth
of 2 x 64 kbps; a five way audio conference using
individual addressing requires a bandwidth of 20 x 64
kbps; and a five way audio conference using group addressing
requires a bandwidth of 5 x 64 kbps.
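The figures quoted above are consistent with one 64 kbps stream per sender-receiver pair under individual addressing, and one stream per sender under group addressing. The following is a minimal sketch of that arithmetic; the function name and the N x (N - 1) generalisation are assumptions inferred from the three quoted cases, not text taken from the specification.

```python
G711_KBPS = 64  # 8-bit PCM sampled at 8 kHz: 8 * 8000 = 64,000 bit/s


def conference_bandwidth_kbps(parties: int, group_addressing: bool = False) -> int:
    """Total network bandwidth, in kbps, for an N-way conference.

    Assumption: with individual addressing every party sends a separate
    stream to each of the other N - 1 parties (N * (N - 1) streams); with
    group addressing every party sends a single stream (N streams).
    """
    if parties < 2:
        raise ValueError("a conference needs at least two parties")
    streams = parties if group_addressing else parties * (parties - 1)
    return streams * G711_KBPS


if __name__ == "__main__":
    print(conference_bandwidth_kbps(2))                          # 2 x 64
    print(conference_bandwidth_kbps(5))                          # 20 x 64
    print(conference_bandwidth_kbps(5, group_addressing=True))   # 5 x 64
```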
[0005] Consequently, it can be seen that the greater 
the number of parties to a conference, the greater the 
bandwidth required to implement the audio communication
system. Allowing an audio communication system
to utilise the available bandwidth of a LAN without re- 
straint will have an adverse effect on the overall perform- 
ance of the LAN. 

[0006] In a typical two-party conversation, each party
speaks for less than approximately forty percent of the
total time for which the parties are connected (see "The
Voice Activity Detector for Pan-European Digital Cellular
Mobile Telephone Service", by Freeman et al, IEEE
1989). The audio communication apparatus used by 
each party conventionally picks up acoustic waves in the 
vicinity of a microphone associated therewith. The 
acoustic waves include the voice of a party to the conversation
and any background office noise. An electrical
signal representing acoustic waves is produced by the 
microphone. The signal is digitised to produce digital au- 
dio samples of the output of the microphone. The sam- 
ples are then placed in, for example, packets and trans- 
mitted over the local area network to a receiving appa- 
ratus for output to the other party to the communication. 
As only forty percent of the samples produced by one 
of the parties contain voice data, it follows that only forty 
percent of the traffic attributable to the two-way conversation
comprises voice data; the remaining packets produced
and transmitted contain silence or, in an office
environment, very low level background noise. GB 2 172
475 A discloses a packet switching system in which
speech is packetised and voice activity detectors are 
used to monitor speech in the Go and Return paths. In 
the Go path, the voice activity detector compares the 
current level of packets with (a) the current background
noise value, and (b) the computed value of the expected 
echo due to speech packets in the Return path. If the 
Go path packet is larger than the parameter of (a) and 
(b) by a preset arrangement the packet is sent, other- 
wise it is not. If the "send" decision persists for a number 
of speech packets, that send condition has a hangover 
period attached to it. If the parameters are properly chosen,
then the speech heard by a subscriber is not unduly
affected.

[0007] A further problem with using a LAN to carry
voice data is that the transmission time across the network
is variable. Thus the arrival of packets at a destination
node is both delayed and irregular. If the packets
were played out in irregular fashion, this would have an
extremely adverse effect on intelligibility of the voice
signal.

[0008] Accordingly, the present invention provides a 
method for audio communication comprising receiving 
a plurality of digital audio samples in a memory and
transmitting said digital audio samples over a data net- 
work when a predetermined number thereof have been 
accumulated, said method further comprising the steps 
of detecting that a current sample does not represent 
voice activity, transmitting all samples stored in said
memory over said data network irrespective of whether
or not there are said predetermined number thereof, and 
suspending transmission of said samples; detecting that 
a current sample represents a resumption of voice ac- 
tivity, resuming said transmitting of said samples, and 
transmitting an indication of the duration of said suspen- 
sion. 

[0009] As a consequence of suspending further transmission
of samples over the data network, redundant
packets containing, in effect, silence or background office
noise, are obviated and the traffic load of the data
network is significantly reduced. Accordingly, a very efficient
way of reducing the traffic load over a data network
is provided. Reducing the traffic load of a data network
improves the performance and efficiency of use
thereof. Both transmitting and receiving apparatuses
are free to process other data rather than concern themselves
with processing voice samples which constitute
silence or mere background noise.
[0010] Although the efficiency of utilisation of a data 
network can be improved as a consequence of using 
data compression algorithms, such algorithms are not 
appropriate to compress silence. The compression of 
silence introduces noise to the audio communication 
and, more significantly, still contributes to the traffic load
of the network. Accordingly, the present invention by 
suspending further transmission of audio data upon de- 
tecting silence advantageously reduces data network 
traffic as compared to audio communication apparatus- 
es which use audio compression techniques. 
[0011] Such an indication or time stamp advantageously
allows a receiving apparatus to correctly
time the output of the received samples after a
period of silence or non-transmission from the transmitting
apparatus, thereby mitigating the effect of delays
introduced as a consequence of data network latency.
[0012] A suitable time stamp or indication could be a 
sample count. Each count represents a single period 
having a duration determined by the sampling rate. 
[0013] An extended period of inactivity over a data 
network typically results in the network timing-out and 
dropping the connection between the apparatuses. 
[0014] Accordingly, the present invention provides a 
method comprising periodically transmitting over said 
data network during said suspension a message in or- 
der to prevent said data network from timing-out. 
[0015] The time-out data can be used to maintain the
correct timing of the output of data at a receiving terminal.
Therefore, the present invention further provides a
method wherein said time-out data comprises said
sample count.

[0016] If a count of samples representing non-voice
activity were transmitted to a receiving terminal every
time the DSP had counted sufficient samples which, had
they been stored, could fill the memory, the extent of
traffic load reduction achieved would be diminished.
[0017] Suitably, an aspect of the present invention
provides a method wherein said periodic transmission
is effected at a predetermined integer multiple of said
sample count, thereby minimising the load on the data
network.

[0018] Having received the voice samples and timing
information it is desirable to produce an audio output 
therefrom. 

[0019] Accordingly, the present invention provides a 
method for audio communication comprising receiving 
a number of digital audio samples over a data network 
and producing an audio output therefrom, said method 
further comprising the steps of receiving further audio
samples after a suspension in transmission of said audio
samples, receiving an indication of the duration of
the suspension in transmission of said audio samples,
and resuming said production of said audio output
according to said indication.

[0020] The present invention also provides an apparatus
for audio communication comprising means for receiving
a plurality of digital audio samples in a memory
and means for transmitting said digital audio samples
over a data network when a predetermined number
thereof have been accumulated, said system further
comprising means for detecting that a current sample
does not represent voice activity, means for transmitting
all samples stored in said memory over said data network
irrespective of whether or not there are said predetermined
number thereof, and means for suspending
transmission of said samples; means for detecting that
a current sample represents a resumption of voice activity,
means for resuming said transmitting of said samples,
and means for transmitting an indication of the duration
of said suspension.

[0021] An embodiment of the present invention will
now be described, by way of example only, with reference
to the accompanying drawings in which:

Figure 1 is a schematic diagram of an audio communication
system,

Figure 2 is a block diagram of apparatus for audio
communication according to an embodiment,

Figure 3 is a simplified diagram showing the components
of an audio adapter card,

Figure 4 schematically illustrates the arrangement
of samples and other data within memory,

Figure 5 is a flow diagram representing the operation
of the apparatus for audio communication of figure 2,

Figure 6 schematically shows the format of data
stored in memory,

Figure 7 schematically illustrates the data flow of
audio samples through an audio communication
system.

[0022] Referring to figure 1, there is shown an audio
communication system 100 in which audio communication
apparatuses 105 exchange voice data over a data
network 110.

[0023] Figure 2 is a schematic illustration of an audio
communication apparatus 105. The apparatus 105
comprises a microprocessor 205, semiconductor memory
(ROM/RAM) 210, and a bus 215 over which data is
transferred. The apparatus 105 may be implemented
using any conventional workstation, such as an IBM PS/2
computer.

[0024] The apparatus 105 is also equipped with two 
adapter cards. The first of these is a Token Ring adapter 
card 220. This card 220, together with accompanying 
software, allows messages to be transmitted to and re- 
ceived from a Token Ring operating over a local area or 
data network 110. The operation of the token ring card 
is well-known, and so will not be described in detail. 
[0025] The second card is an audio adapter card 225.
The audio adapter card is connectable to a microphone
and a loudspeaker (not shown in figure 2) for audio input
and output respectively. The apparatuses are typically
used for N-way voice communications over a LAN,
where N is at least two, but may also be used in other
multimedia applications, where one node in the network,
for example, generates a sound signal (eg from an optical
disk), which is transmitted over the network to be
played out to a user at another node.
[0026] The audio adapter card 225 is illustrated in 
more detail in Figure 3. The audio adapter card 225 
schematically illustrated is an M-Wave card available 
from IBM. The card contains an A/D converter 300 to 
digitise incoming audio signals derived from an attached 
microphone 305. The A/D converter 300 is attached to 
a codec 310, which samples the incoming audio signal
at a rate of 44.1 kHz into 16 bit samples (corresponding
to the standard sampling rate/size for compact disks).
Digitised samples are then passed to a digital signal
processor (DSP) 315 on the card via a double buffer 320
(ie the codec 310 loads a sample into one half of the
double buffer while the DSP 315 reads the previous
sample from the other half). The DSP 315 is controlled
by one or more programs stored in semiconductor mem- 
ory 325 on the card. The DSP places the samples into 
memory 325 for subsequent transmission thereof to the 
intended addressee. Referring to figure 4, the memory 
comprises three fields. The first field 400 contains an 
indication of the number of samples within the memory 
325 i.e. a length count. The length count is increased by 
one for each sample stored in the current block of sam- 
ples. After transmission of each block of voice samples 
the length count is reset to zero. The second field 405 
contains a sample count representing the total number 
of samples encountered i.e. both voice and noise sam- 
ples. The second field 405 typically comprises two bytes 
and accordingly will overflow to zero after 65536 samples
have been counted. The third field 410 contains the
voice samples of the current block stored in memory 
325. When the samples are ready for transmission the 
DSP 315 ensures that the contents of memory 325 are 
transmitted via the token ring card 220 over the network 
110 to a receiving terminal 105. That is, the samples 410
together with the length 400 and sample 405 counts are 
transmitted to the receiving terminal 105. 
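The three-field layout of figure 4 can be illustrated with a short sketch. The text states only that the sample count occupies two bytes; the two-byte length field, the big-endian byte order and the function names below are assumptions made purely for illustration.

```python
import struct
from typing import Tuple

# Assumed header layout: 2-byte length count (field 400) followed by a
# 2-byte sample count (field 405), both big-endian. Only the 2-byte width
# of the sample count is stated in the description.
HEADER = struct.Struct(">HH")


def pack_block(sample_count: int, samples: bytes) -> bytes:
    """Build one transmittable block: length count, sample count, samples."""
    # The two-byte sample count overflows to zero after 65536 samples.
    return HEADER.pack(len(samples), sample_count % 65536) + samples


def unpack_block(data: bytes) -> Tuple[int, int, bytes]:
    """Split a received block back into its three fields."""
    length, sample_count = HEADER.unpack_from(data)
    samples = data[HEADER.size:HEADER.size + length]
    return length, sample_count, samples
```

For instance, packing three samples with a running count of 65541 yields a seven-byte block whose stored sample count is 5, since the two-byte field has wrapped once.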
[0027] The length of the memory exceeds that illustrated
in figure 4. The length field 415 represents the
beginning of the next set of samples to be transmitted.
Therefore, a further field 415 can optionally be transmitted
over the network 110 to delimit the current samples
410 from any succeeding samples.
[0028] Although the embodiment described comprises
a memory 325 having three fields, an embodiment
can equally well be realised in which the two counts are
stored in separate registers which are, for example, internal
to the DSP. The counts 400 and 405 can then be
added to the samples 410 to be transmitted during the
transfer of the samples 410 to the token ring card 220.

[0029] The samples 410 may be output for transmission
upon the occurrence of any of the following events:
the number of samples in memory 325 reaches a predetermined
maximum number (such a limit is necessary
to prevent the speech samples from being unduly delayed
and thereby affecting the intelligibility of speech);
silence or suspension of voice activity is detected; or,
during prolonged periods of silence, a maximum predetermined
period of silence or period of suspension is
reached.

[0030] The communication links between the DSP
315 and the microprocessor 205 of both the transmitting
and receiving apparatuses, and between the respective token
ring cards of the transmitting 105 and receiving 105
apparatuses, all comprise asynchronous links. Accordingly, in
the embodiment described, the capture of samples at
the transmitting terminal and output of the samples at
the receiving terminal are asynchronous processes.
[0031] In the particular embodiment shown, the DSP
is programmed to transform samples from the codec
from 16 bits at 44.1 kHz into a new digital signal having
an 8 kHz sampling rate, with 8-bit samples on a μ-law
scale (essentially logarithmic), ie corresponding to
CCITT standard G.711, using standard re-sampling
techniques. The total bandwidth of the signal is therefore
64 kbps. The DSP also performs the opposite conversion
on an incoming signal received, ie it converts
the signal from 8-bit, 8 kHz to 16-bit, 44.1 kHz, again using
known re-sampling techniques.
[0032] Note that this conversion between the two
sampling formats is not an essential part of the invention.
Some other audio cards may intrinsically support
the 8 kHz format (ie the CODEC can operate according
to G.711 format), or, alternatively, the 44.1 kHz samples
could be used throughout. The latter option demands a
much higher bandwidth and greatly increased processing
speed; these might be acceptable if the audio signal
being transmitted over the network needs to be of CD
quality, but for normal voice communications the 64 kbps
bandwidth signal of the G.711 format is perfectly adequate.

[0033] The DSP aggregates 64 samples in the G.711
format together into blocks of 64 bytes, corresponding
to 8 ms of data. This represents the basic unit of
processing and transmission over the network. Thus
every 8 ms, a new block of samples is available for transmission
over the data network. Finally, the DSP raises
an interrupt in a thread running on the main processor
205, again in accordance with known interrupt processing
techniques, informing it that another block of samples
is ready for transmission over the network. The
DSP cycles through the loop shown in Figure 5 every 8
ms.

[0034] The main processor 205 requests access to 
the data network 110 via the Token Ring card 220. This 
is performed using the NCB_SEND command in the 
NETBIOS interface (data transmission over a LAN is 
well-known, and so will not be described in detail; see 
IBM Local Area Network Technical Reference,
SC30-3383-03 for more information). The Token Ring 
card incorporates the samples into a suitable data pack- 
et and transmits it over the network. 
[0035] The DSP also monitors the samples to determine
whether or not they represent voice activity at the
microphone 305. Many algorithms are available which
can detect voice activity. Such algorithms are disclosed
in, for example, US 4 975 657, US 4 700 392 and US 4
028 496. Accordingly, a detailed description of such an
algorithm will not be given. An appropriate algorithm
for determining whether or not voice activity is present is
that used in the GSM cellular communication standard.
Alternatively, a simple threshold value can be used to
determine whether or not samples represent speech, i.e. if
the sample value is above the threshold, voice activity is
present, and vice versa. If the current sample does not
represent voice activity, an interrupt to the microprocessor
205 is raised indicating that the voice samples stored thus
far should be immediately transmitted over the data network
110 to the intended addressee. Such an immediate transfer of
the samples results in a variable length block of voice
samples.
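The simple threshold alternative mentioned above can be sketched as follows. The sketch assumes signed linear samples and compares the magnitude against the threshold; the default threshold of 500 and the function name are hypothetical, and a practical detector (such as the GSM one) is considerably more elaborate.

```python
def is_voice(sample: int, threshold: int = 500) -> bool:
    """Return True if a signed linear sample is taken to represent voice.

    Assumption: the magnitude of the sample is compared with the threshold,
    so that negative excursions of the waveform also count as activity.
    The threshold value is illustrative only.
    """
    return abs(sample) > threshold


if __name__ == "__main__":
    for value in (1200, -30, -1200):
        print(value, is_voice(value))
```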

[0036] The DSP 315, upon determining that voice activity
is not present, no longer stores samples, which
now represent either silence or very low level background
office noise. Since the samples do not represent
voice activity, the DSP ceases to interrupt the microprocessor
and transmission of data over the data network
110 is suspended. Accordingly, the traffic over the data
network 110 is reduced. As only approximately forty percent
of the data emanating from a conventional terminal
represents voice activity, the present invention achieves
approximately a sixty-percent reduction in traffic associated
with an audio communication.
[0037] During the period of suspension, the DSP con- 
tinues to monitor the samples output by the codec. The 
storage and subsequent transmission of samples over 
the data network is resumed when voice activity is de- 
tected. 

[0038] Although transmission over the data network
is suspended, the DSP 315 continues to maintain the
sample count. In an embodiment, an overflow count is
maintained of the number of times the sample count
has overflowed and reset to zero. When the overflow
count reaches a predetermined value, an interrupt is
raised to the microprocessor 205 requesting transmission
of the sample count over the data network. Such a
transmission is effected in the usual manner. Periodically
transmitting the sample count as described has a twofold
advantage. Firstly, the transmission prevents the
data network from timing out and dropping the connection
between an addressor and an addressee. Secondly,
the receiving apparatus can use the sample count to
maintain the correct timing of the output of any subsequently
received samples representing voice activity
when transmission resumes. Transmission continues
after resumption as described above.
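The overflow-driven keep-alive described above might be modelled as in the following sketch. The class name, the `overflow_limit` parameter and the decision to reset the overflow count once a keep-alive falls due are assumptions; the description states only that an interrupt is raised when the overflow count reaches a predetermined value.

```python
class SilenceCounter:
    """Maintains the two-byte sample count while transmission is suspended."""

    def __init__(self, overflow_limit: int) -> None:
        self.sample_count = 0          # wraps to zero after 65536 samples
        self.overflow_count = 0        # number of wraps since the last keep-alive
        self.overflow_limit = overflow_limit

    def tick(self) -> bool:
        """Count one non-voice sample; return True when a keep-alive is due."""
        self.sample_count = (self.sample_count + 1) % 65536
        if self.sample_count == 0:
            self.overflow_count += 1
            if self.overflow_count >= self.overflow_limit:
                # Assumption: the overflow count restarts after each keep-alive.
                self.overflow_count = 0
                return True
        return False
```

At an 8 kHz sampling rate one overflow corresponds to roughly 8.2 seconds of silence, so an `overflow_limit` of 2 would give a keep-alive about every 16 seconds.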

[0039] Referring to figure 5, there is shown a flow diagram
illustrating the operation of an embodiment. The
sample and length counts are set to zero at step 500.
At step 505 the signal output from the microphone is
sampled and the sample count is incremented by one.
A determination is made as to whether or not the microphone
is picking up voice activity at step 510. If the
microphone is picking up voice activity, the sample is added
to the memory 325 and the length count is incremented
by one at step 515. After the sample is added to memory
325, a check is made, at step 520, as to whether or
not the samples should be output for transmission over
the data network by comparing the length count against
a predeterminable threshold. The threshold value is
dependent upon the maximum size of the buffer or an
acceptable transmission delay which will not affect the
intelligibility of the speech as discussed above. If there
are sufficient samples, all samples stored in the memory
325 are output to the token ring card for transmission
over the data network at step 525. At step 530 the length
count is reset to zero, then sampling of the microphone
signal is resumed at step 505. If the microphone is not
picking up voice activity, a determination is made at step
535 as to whether or not the memory 325 contains any
samples to be output for transmission by the token ring
card over the data network. If the length count is equal to
zero, the memory 325 does not contain any such samples and
sampling of the microphone signal is resumed. If the
length count is not equal to zero, the memory 325 contains
samples which should be immediately output for
transmission over the network. Any such samples are
so output for transmission at step 525. The length count
is then reset to zero at step 530 and sampling of the
microphone is resumed at step 505.
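The flow of figure 5 can be summarised in a short sketch. The function and parameter names are hypothetical, the voice activity test and the transmit routine are passed in as callables, and the length of the Python list stands in for the length count; the step numbers in the comments refer to the flow diagram as described above.

```python
from typing import Callable, Iterable, List


def transmit_loop(
    samples: Iterable[int],
    is_voice: Callable[[int], bool],
    max_block: int,
    send: Callable[[int, List[int]], None],
) -> None:
    """Run the capture loop over a finite sequence of samples."""
    sample_count = 0                    # step 500: counts set to zero
    block: List[int] = []               # len(block) acts as the length count
    for sample in samples:              # step 505: take the next sample
        sample_count = (sample_count + 1) % 65536
        if is_voice(sample):            # step 510: voice activity?
            block.append(sample)        # step 515: store, bump length count
            if len(block) >= max_block:         # step 520: enough samples?
                send(sample_count, block)       # step 525: output block
                block = []                      # step 530: reset length count
        elif block:                     # step 535: anything left to flush?
            send(sample_count, block)   # step 525: variable length block
            block = []                  # step 530
        # Otherwise: silence with an empty buffer, so nothing is sent.
```

A real DSP loop runs indefinitely; in this finite sketch any samples still buffered when the input ends are simply not sent.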

[0040] Although the above uses a count to indicate
whether or not the memory contains samples for trans- 
mission, a simple flag can equally well be used instead. 
[0041] Audio signals to be played out are received by
the DSP 315 from the terminal bus 215, and processed
in a converse fashion to incoming audio. That is, the output
audio signals are passed through the DSP 315 and
a double buffer 330 to the codec 310, from there to a D/A
converter 335, and finally to a loudspeaker 340 or other
appropriate output device.

[0042] Although the above embodiment uses an interrupt
technique to inform the main processor when the
next set of samples is available for transmission, a
polling technique can equally well be used. The use of
a polling technique is preferable when working the
present invention within a Windows™ environment. The
microprocessor 205 of the terminal 105 periodically interrogates
the audio communication apparatus to determine
whether or not the current block of voice samples
is ready for transmission. The count, stored in memory
325, is used to indicate whether or not the samples are
ready for transmission. The DSP maintains an internal
count within an internal register and the count stored in
memory 325 is maintained at zero until the samples are
ready for transmission. Accordingly, upon interrogation
by the microprocessor 205, a count of zero indicates that
the current block of samples is not ready for transmission
whereas a non-zero count indicates that the current
block of samples is ready for transmission. As
above, the samples are ready for transmission if either
the maximum number of samples has been stored or
suspension of voice activity is detected. Upon the occurrence
of either event the DSP copies the internal
count into memory 325. Accordingly, when the microprocessor
205 next interrogates the external count in
memory 325, a value other than zero will be seen, indicating that
the current block of samples is ready to be transmitted
via the token ring card over the data network to the intended
addressee.
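The polling handshake can be illustrated with the following simplified sketch. All class and function names are hypothetical, the shared memory is modelled as a plain Python object, and the case in which the DSP stores further samples before the host has collected a published block is deliberately not handled.

```python
from typing import List, Optional


class SharedBlock:
    """Memory visible to both DSP and host; external_count == 0 means 'not ready'."""

    def __init__(self) -> None:
        self.external_count = 0
        self.samples: List[int] = []


class Dsp:
    def __init__(self, shared: SharedBlock, max_block: int) -> None:
        self.shared = shared
        self.max_block = max_block
        self.internal_count = 0         # kept in an internal register

    def store(self, sample: int) -> None:
        """Store one voice sample; publish when the maximum is reached."""
        self.shared.samples.append(sample)
        self.internal_count += 1
        if self.internal_count >= self.max_block:
            self.publish()

    def publish(self) -> None:
        """Copy the internal count to shared memory (block full or voice stopped)."""
        if self.internal_count:
            self.shared.external_count = self.internal_count
            self.internal_count = 0


def host_poll(shared: SharedBlock) -> Optional[List[int]]:
    """Host side: return the ready block, or None if the count is still zero."""
    if shared.external_count == 0:
        return None
    block = shared.samples[:shared.external_count]
    shared.samples = shared.samples[shared.external_count:]
    shared.external_count = 0
    return block
```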

[0043] The receiving apparatus, upon receiving an audio
packet over the data network, performs the inverse
of the above. The microprocessor 205 is informed by
the token ring card of the arrival of a message or packet
containing samples. The samples together with the
sample count are transferred by the microprocessor into
memory 325 contiguous with any preceding samples
which have yet to be output, as shown in figure 6. In order
to distinguish the most recently received samples 625
and associated counts 615 and 620 from the previously
received samples 610 and associated counts 600 and
605, the length count 615 of the former is temporarily
withheld (set to zero) by the microprocessor 205. Accordingly,
the data loaded into memory by the microprocessor
is as shown in figure 6. The DSP 315 is arranged
to cease output of samples when the zero length
count is encountered; such an arrangement prevents
the M-Wave card from outputting samples from the next
block of samples. Alternatively, the DSP could keep
track of the number of samples output thus far and, by
comparing this number against the length count, determine
when to cease output of samples.
[0044] The samples received are output at a rate determined
by a local clock at the receiving apparatus. Although
the frequency of the local clock may closely
match the frequency of the local clock in the transmitting
apparatus, in practice the two clocks do not operate at
identical frequencies. Accordingly, over time the corresponding
periods of said clocks will occur at different
absolute times. It is desirable to maintain the relative
timing of speech output.

[0045] The DSP 315 of the receiving terminal also maintains
a sample count. As samples are not received during
periods of suspended transmission, the sample count
is incremented according to the local clock at the receiving
terminal. Due to the above clock drift, the sample
count at the receiving terminal will not necessarily match
the sample count at the transmitting terminal. Assuming
that the local clock at the receiving terminal is slightly
slower than the local clock at the transmitting terminal, the
sample count at the receiving terminal will be less than
that received.

[0046] In the embodiment described, the microprocessor
cannot interrupt the DSP to indicate that voice
samples are ready to be output. Accordingly, the DSP
315, when awaiting samples for output, continually polls
the length field in memory to determine whether or not
it is zero. When the length field is non-zero, the DSP
compares its sample count with the sample count in the
second field of memory and outputs nothing or comfort
noise until the two counts match. When the two counts
match, the samples following the length count are output
to the intended addressee, thereby ensuring that the relative
periods of silence are of the correct length. Further,
re-synchronisation is established after the period of
silence.
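The count-matching playout can be sketched as below. Several points are assumptions made for illustration: each block's sample count is treated as the local count at which its first sample should be played, `None` stands for "nothing or comfort noise", and a block whose count is already behind the local count is played immediately rather than waiting for the 16-bit count to wrap.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

MODULUS = 65536  # the sample count is a two-byte value


def playout(
    blocks: Iterable[Tuple[int, List[int]]],
    local_count: int = 0,
) -> Iterator[Optional[int]]:
    """Yield output samples, padding with None until the counts match."""
    for block_count, samples in blocks:
        gap = (block_count - local_count) % MODULUS
        if gap >= MODULUS // 2:
            gap = 0                     # assumed policy: a late block plays at once
        for _ in range(gap):            # silence or comfort noise
            yield None
        local_count = (local_count + gap) % MODULUS
        for sample in samples:          # counts match: play the block
            yield sample
            local_count = (local_count + 1) % MODULUS
```

For example, blocks stamped 2 and 6 containing two and one samples respectively are played with two silent sample periods before each, which restores the original length of the untransmitted silence.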

[0047] A further embodiment provides for latency
compensation at the receiving apparatus in order to absorb
transient delays introduced by the data network.
The microprocessor 205 of the receiving apparatus, upon
receiving any samples and associated counts, adds
to the sample count a value representing an actual latency, L.
This has the effect of delaying the output of the samples
by a time given by L multiplied by the local clock period.
The value of L is incrementally increased if underruns
occur and gradually decreased while underruns do not
occur. Accordingly, the effect of any latency introduced
by the data network is mitigated.

[0048] A further embodiment can be realised which
takes advantage of the adaptive latency compensation
in which L is varied only by very small amounts. The
receiving apparatus establishes a second count, T, the target
latency, at the beginning of an audio communication
and a value for the actual latency, L. The target latency
is arranged to decrement by one count every multiple of
a predetermined number of received samples. Typically,
the target latency may decrease by one count every one
thousand received samples. The actual latency value is
arranged to slowly track the target latency value. Typically
the actual latency value is incremented or decremented
by 1 every one thousand received samples.
When an underrun occurs the target latency is increased
by a predetermined number of samples. The
increase in target latency is generally of the order of one
to two hundred samples. The target latency then continues
slowly decrementing. Accordingly, the target latency
varies as a sawtooth waveform. However, the
mean target latency may vary considerably over a long
period of time if the respective local clocks drift. The target
latency will also increase as a consequence of data
network traffic load, thereby reducing the occurrence of
underruns by incorporating a longer latency. The target
latency will of course drift down again when network traffic
load is reduced. The actual latency is arranged to follow
the target latency using very small variations in value as
above. Such an arrangement results in the desired
adaptive behaviour without introducing any audio clicks
that may occur if L were varied using large changes.
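The interaction of the target latency T and the actual latency L can be sketched as follows. The class and parameter names are hypothetical; the one-thousand-sample step interval follows the typical value given above, while the default jump of 150 samples on underrun and the floor of zero on T are assumptions chosen for illustration.

```python
class LatencyTracker:
    """Sawtooth target latency T with a slowly tracking actual latency L."""

    def __init__(self, initial: int, step_interval: int = 1000,
                 underrun_boost: int = 150) -> None:
        self.target = initial           # T, in samples
        self.actual = initial           # L, in samples
        self.step_interval = step_interval
        self.underrun_boost = underrun_boost
        self._received = 0

    def on_samples(self, n: int) -> None:
        """Account for n received samples, stepping T and L when due."""
        for _ in range(n):
            self._received += 1
            if self._received % self.step_interval == 0:
                # T decays by one count (assumed never to go below zero).
                self.target = max(0, self.target - 1)
                # L moves by at most one count towards T, avoiding clicks.
                if self.actual < self.target:
                    self.actual += 1
                elif self.actual > self.target:
                    self.actual -= 1

    def on_underrun(self) -> None:
        """An underrun raises T abruptly; L catches up only gradually."""
        self.target += self.underrun_boost
```

As the following paragraph of the description notes, a receiver in a conference with several senders would keep one such tracker per transmitting apparatus.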
[0049] Further, in an audio communication comprising
more than two participants, each receiving apparatus
must maintain, for each transmitting apparatus logically
connected thereto, respective values of L and T, as
respective local clocks vary independently.
[0050] A further embodiment can be realised as follows.
The transmitting apparatus transfers audio samples
from a microphone attached thereto to a speaker
of a receiving apparatus. It is important to reduce the
load on the network and the two host microprocessors 205.
The speech output must be correctly timed even though
the clocks at the apparatuses may drift, and samples
may be delayed or lost by the network link.
[0051] Referring to figure 7, there is shown schematically
the data flow of voice samples through an audio
communication system using an embodiment of the
present invention. The links 700 and 710 between the
DSPs and the microprocessors are asynchronous. Similarly,
the link 705 across the network 110 is also asynchronous.
Conventionally, double-buffering is used to
communicate between a host microprocessor and a
DSP card, i.e. one process works on buffer1 while the
other process works on buffer2. However, the embodiment
described does not use this technique as it may
lead to unsatisfactory results where the data is interspersed
with periods of non-transmitted silence. Instead
the two processes work on a single buffer in memory
325, but control their interaction as follows:

1) DSP1 starts using the buffer by preparing an area
of memory equal to its maximum block size plus a
header which contains a length field and a sample
count. The length field is set to zero while work is
in progress on the buffer.

2) DSP1 collects speech samples, distinguishing between
speech and silence. It copies non-silent samples
into the appropriate section of the buffer.

3) The DSP transfers voice samples across the net- 
work when either: 

a) The number of voice samples reaches the maximum
block size.

b) There is a change from speech to silence. 

c) The physical buffer in memory becomes full. 

d) There is a period of silence which exceeds 
a predetermined duration. 

4) DSP1 indicates that a block is ready for
transfer by:

a) Filling in the sample count. This sample
count is the local count of speech samples including
silence, so that the receiving apparatus
will be able to compute how much silence has
not been transmitted.

b) Filling in a zero length to the following (unstarted)
block.

c) Finally setting the length count. The non-zero
value acts as a signal to free this section of the
physical buffer, while the following zero has
locked a new section of the buffer for DSP1 to
use.

5) The normal action is for DSP1 to interrupt the host
microprocessor at this point. Then the host will
know to use the section of buffer just completed.
The above technique has the advantage that it is
alternatively possible for the host to periodically poll
the buffer to see whether or not the length count
contains a non-zero value. This may be more attractive
in systems where a program running in an
interrupt is in some way restricted.

6) The microprocessor of the transmitting terminal
transfers the variable data to the token ring card for
transmission across the network. The microprocessor
signals to DSP1 that it has finished using
the section of buffer by setting the length back to
zero, or alternatively by negating the length to indicate
when the block is free for reuse. This is necessary
because otherwise DSP1 might continue
processing the contents of memory and inadvertently
wrap round in the buffer and reuse the memory
before the host has read the data.

Note that in the interrupt-driven case the only 
load put on the microprocessor of the transmitting 
apparatus (and on the network) is when there is a 
block of non-silent data to transmit. 

However a very long period of silence might
cause problems. For example the network link
might time out, or the sample count field carried with
the data might overflow and reset to zero. DSP1
therefore sends a null block when the maximum silence
limit is reached. This ensures that the DSP of the
receiving apparatus will correct any overflow of the
sample count. The overhead incurred is normally
very small since the packet containing the sample
count is correspondingly small.

7) At the receiving apparatus the microprocessor
will normally be interrupted by network hardware
when a block is to be received. The microprocessor
at the receiving apparatus receives the block, does
some latency calculations (see below) and transfers
the block down to the DSP2 buffer. It uses the
same 'locking' technique, namely it takes ownership
of a section of the buffer, ensuring that the length of
this block and the following (unstarted) block is set
to zero to signal unready data. Since data transfer
is normally left-to-right it is possible to set the following
zero length in the same operation as the
count and data transfer, but the true length of the
real data block must be left at zero and transferred
last.

0000 cccc dd dd (data) dd dd dd 0000 <- start of next block.

lnth cccc dd dd (data) dd dd dd 0000 <- start of next block.

lnth set as the final operation

8) Conventionally host microprocessors 205 do not
interrupt the DSP. A conventional solution is for the
DSP to interrupt the host periodically to 'demand'
data. This will put a non-trivial load on the host microprocessor
if the interrupts are reasonably frequent,
even if there is not any data available. Accordingly,
the locking mechanism provides a preferable
solution. When DSP2 has voice samples to
output it does so regardless of the action of the host
microprocessor. When output of the current block
of voice samples is complete, DSP2 immediately
polls the length field of the following block. If this is
non-zero then output of that block commences.

However if the length is zero, output is suspended
until the zero value changes. DSP2 again
periodically polls the length field until it changes.

9) When DSP2 finds a non-zero length field it has
data to play, but if its own clock count is still less
than the sample count associated with the data, either
silence is output or output is suspended until
the local sample count matches the transmitted
sample count. This process ensures that arbitrary
sections of silence are given their correct duration.

10) If its local sample count is already greater than
the transmitted sample count, then an underrun has
occurred, i.e. the transmitting apparatus is late.
DSP2 must detect this although it can do nothing
about it other than inform the host microprocessor
of its occurrence (lost packets are similarly irretrievable,
though the sample count ensures that subsequent
packets are still output at the correct time,
i.e. a lost packet appears as a suspension in transmission).

11) The local clocks in the respective transmitting
and receiving apparatuses may drift with respect to
each other. If clock2 is faster than clock1 then underruns
occur. If clock2 is slower than clock1 then
output of the samples becomes progressively later
and later until the perceived delay becomes intolerable
to the human user.

To overcome this the receiving host inserts a
deliberate latency delay of L samples. Such a delay
is the normal technique to allow the apparatus to
absorb transient delays over the network. The receiving
apparatus adds L to each sample count field
before passing the data to DSP2. L is arranged to
vary by very small amounts. L will be incrementally
increased if underruns occur and will be gradually
decreased while there are no underruns. One major
advantage of the process described here is that latency
compensation does not need to be on any
'block' basis but the granularity can be as fine as a
single sample. A sample is simply discarded to reduce
latency, or the last sample is repeated to increase
it.
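
The single-buffer handshake of steps 1) to 8) may be easier to follow in code. The following is a minimal, single-threaded sketch and not the embodiment's implementation: the 16-bit little-endian header fields, the class name `SharedBuffer` and the method names `publish` and `poll` are assumptions made for illustration, and buffer wrap-around, the release of consumed blocks (step 6) and null blocks are deliberately omitted.

```python
import struct

U16 = struct.Struct("<H")  # assumed field width: 16 bits, little-endian
HEADER_SIZE = 2 * U16.size  # length field followed by sample count


class SharedBuffer:
    """One linear buffer shared by a producer and a consumer (no wrap-around)."""

    def __init__(self, size: int) -> None:
        self.mem = bytearray(size)  # all zero: every length field reads "not ready"
        self.write_pos = 0
        self.read_pos = 0

    def publish(self, sample_count: int, data: bytes) -> None:
        """Producer: write count and data, zero the next length, set length last."""
        if not data:
            raise ValueError("this sketch does not model null blocks")
        start = self.write_pos
        end = start + HEADER_SIZE + len(data)
        if end + U16.size > len(self.mem):
            raise BufferError("this sketch does not implement wrap-around")
        U16.pack_into(self.mem, start, 0)  # length stays zero while work is in progress
        U16.pack_into(self.mem, start + U16.size, sample_count & 0xFFFF)
        self.mem[start + HEADER_SIZE:end] = data
        U16.pack_into(self.mem, end, 0)  # lock the following (unstarted) block
        U16.pack_into(self.mem, start, len(data))  # final operation: release the block
        self.write_pos = end

    def poll(self):
        """Consumer: return (sample_count, data) if a block is ready, else None."""
        (length,) = U16.unpack_from(self.mem, self.read_pos)
        if length == 0:
            return None  # unready data; poll again later
        (count,) = U16.unpack_from(self.mem, self.read_pos + U16.size)
        start = self.read_pos + HEADER_SIZE
        data = bytes(self.mem[start:start + length])
        self.read_pos = start + length
        return count, data


buf = SharedBuffer(64)
print(buf.poll())  # nothing published yet
buf.publish(160, b"\x01\x02\x03")
print(buf.poll())  # the block that was just released
```

Because the length field is written last, a consumer that polls at any moment sees either a zero length or a complete block. A real two-processor implementation would additionally have to guarantee that the writes become visible in that order.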

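The timing decision made by DSP2 in steps 9) and 10) can be sketched in the same way. This is an illustrative sketch only: the names `Action`, `playout_action` and `silence_to_insert` are hypothetical, the block's sample count is taken to be the value after the host has added L, and wrap-around of the counters is ignored.

```python
from enum import Enum


class Action(Enum):
    PLAY = "play"          # counts match: output the block now
    WAIT = "wait"          # block is early: output silence or suspend output
    UNDERRUN = "underrun"  # block is late: report to the host microprocessor


def playout_action(local_count: int, block_count: int) -> Action:
    """Compare the local sample count with the count carried by the block."""
    if local_count < block_count:
        return Action.WAIT
    if local_count > block_count:
        return Action.UNDERRUN
    return Action.PLAY


def silence_to_insert(local_count: int, block_count: int) -> int:
    """Number of silent samples to output before the block becomes due."""
    return max(0, block_count - local_count)


print(playout_action(1000, 1400), silence_to_insert(1000, 1400))
print(playout_action(1500, 1400), silence_to_insert(1500, 1400))
```

In this sketch a lost packet needs no special handling: the next block simply carries a larger sample count, so the gap is played as silence of the correct duration.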

[0052] To take advantage of this fine granularity L
should not be changed violently. Instead a second count
T, the target latency, is introduced. T and L are initialised
when the session is established by interrogating the local
DSP's count. Thereafter T is reduced continuously
(but only slowly, e.g. by one or two samples per thousand)
while no underruns occur. When an underrun is
detected then T is increased by a relatively large delta,
e.g. 100-200 samples, from where it drifts down again.
So T varies like a sawtooth waveform although its mean
value may drift up or down by a large amount over a
long period, if the clocks drift. It will also drift up in response
to transient network load, thereby reducing underruns
by incorporating a longer latency, but will drift
back down when the load is reduced.

[0053] L hunts after T at all times but only by small
deltas (one or two samples per thousand). This ensures
that L will give the desired adaptive behaviour, but without
any of the audio clicks that may occur if a large (e.g.
20 milliseconds) unit of speech is lost, or a similar
segment of silence inserted.
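
The interplay of T and L can be illustrated with a small sketch. It assumes a jump of 150 samples on each underrun and one step per thousand received samples, both chosen from within the ranges quoted above; the class name `LatencyTracker` and its methods are hypothetical and are not taken from the embodiment.

```python
class LatencyTracker:
    """Target latency T (sawtooth) and actual latency L (slowly hunting T)."""

    def __init__(self, initial: int, jump: int = 150, period: int = 1000) -> None:
        self.target = initial   # T
        self.actual = initial   # L
        self.jump = jump        # assumed increase of T per underrun, in samples
        self.period = period    # assumed number of received samples per step
        self._since_step = 0

    def on_underrun(self) -> None:
        """An underrun raises T by a relatively large delta."""
        self.target += self.jump

    def on_samples(self, received: int) -> None:
        """Account for received samples; step T and L once per period."""
        self._since_step += received
        while self._since_step >= self.period:
            self._since_step -= self.period
            self.target -= 1                # T drifts slowly downwards
            if self.actual < self.target:   # L moves towards T by one sample
                self.actual += 1
            elif self.actual > self.target:
                self.actual -= 1


tracker = LatencyTracker(initial=500)
tracker.on_samples(3000)
print(tracker.target, tracker.actual)
tracker.on_underrun()
tracker.on_samples(2000)
print(tracker.target, tracker.actual)
```

Each unit change of L would be realised by discarding one sample or repeating the last one, so no single adjustment exceeds one sample period.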

[0054] Note that where communication is between
many participants it is necessary for each receiving
node to maintain independent latency counts, T and L,
for each of the sending nodes with which it is connected,
since all clocks may drift independently.
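
One possible way of keeping such per-sender state is sketched below. The names `SenderLatency`, `Receiver`, `open_session` and `stamp`, and the use of a string identifier for each sending node, are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SenderLatency:
    target: int  # T for this sending node
    actual: int  # L for this sending node


class Receiver:
    """Keeps an independent (T, L) pair for every connected sending node."""

    def __init__(self) -> None:
        self.senders = {}  # maps a sender identifier to its SenderLatency

    def open_session(self, sender_id: str, initial: int) -> None:
        """Initialise T and L for a sender when its session is established."""
        self.senders[sender_id] = SenderLatency(target=initial, actual=initial)

    def stamp(self, sender_id: str, sample_count: int) -> int:
        """Sample count handed to the DSP: transmitted count plus that sender's L."""
        return sample_count + self.senders[sender_id].actual


rx = Receiver()
rx.open_session("A", 400)
rx.open_session("B", 900)
print(rx.stamp("A", 1000), rx.stamp("B", 1000))
```

Adjusting L for one sender leaves the sample counts of every other sender untouched, which is the independence the paragraph above requires.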
[0055] Although an embodiment uses adaptive
buffering as described above, other methods of adaptive
buffering, such as that disclosed in "Adapter Audio Playout
Algorithm for Shared Packet Networks", by B Aldred,
R Bowater, and S Woodman, IBM Technical Disclosure
Bulletin, p255-257, Vol. 36, No. 4, April 1993, can equally
well be used.






Claims 



1. A method for audio communication comprising receiving
a plurality of digital audio samples (410) in
a memory (325) and transmitting said digital audio
samples over a data network (110) when a predetermined
number thereof have been accumulated,
said method further comprising the steps of

detecting that a current sample does not represent
voice activity,
transmitting all samples
stored in said memory over said data network
irrespective of whether or not there are said
predetermined number thereof, and
suspending transmission of said samples;
detecting that a current sample represents a resumption
of voice activity,
resuming said transmitting of said samples,
and

transmitting an indication of the duration of said
suspension.

2. A method as claimed in claim 1, further comprising
the step of

maintaining a count (405) of the number of
samples, and

wherein said indication represents said count.

3. A method as claimed in any preceding claim, further
comprising the step of

periodically transmitting over said data network
during said suspension a message in order to
prevent said data network from timing-out.

4. A method as claimed in claim 3, wherein said message
comprises a timing indication representing the
time of occurrence of said transmission.



5. A method as claimed in claim 4, wherein said timing
indication comprises a sample count representing the
number of samples taken.

6. A method as claimed in either of claims 4 or 5,
wherein said timing indication comprises a length
count representing the number of samples transmitted.

7. A method for audio communication comprising receiving
a number of digital audio samples over a
data network (110) and producing an audio output
therefrom, said method further comprising the steps
of

receiving further audio samples after a suspension
in transmission of said audio samples,

receiving an indication of the duration of the
suspension in transmission of said audio samples, and

resuming said production of said audio output
according to said indication.

8. A method as claimed in claim 7, wherein the step
of producing comprises compensating for transient
network load and clock drift.

9. A method as claimed in claim 8, wherein said compensating
comprises the step of modifying said indication.

10. A system for audio communication comprising
means for receiving a plurality of digital audio samples
(410) in a memory (325) and means for transmitting
said digital audio samples over a data network
(110) when a predetermined number thereof
have been accumulated, said system further comprising

means for detecting that a current sample does
not represent voice activity,
means for transmitting all samples stored in
said memory over said data network irrespective
of whether or not there are said predetermined
number thereof, and
means for suspending transmission of said
samples;

means for detecting that a current sample represents
a resumption of voice activity,
means for resuming said transmitting of said
samples, and means for transmitting an indication
of the duration of said suspension.



Claims (Patentansprüche)

1. A method for audio transmission, comprising the
reception of a plurality of digital audio samples
(410) in a memory (325) and the transmission of the
digital audio samples over a data network (110)
when a previously determined number of these samples
has been accumulated, the method further
comprising the following steps:

determining that a current sample does not
represent voice activity,

transmitting all samples stored in the memory
over the data network regardless of whether
the previously determined number of samples
is present, and

interrupting the transmission of the samples;

determining that a current sample represents a
resumption of voice activity,

resuming the transmission of the samples, and

transmitting an indication of the duration of the
interruption.

2. A method according to claim 1, further comprising the
step of maintaining a count (405) of the
number of samples, wherein the indication
represents the count.

3. A method according to any of the preceding claims,
further comprising the following step:

periodically transmitting a message over
the data network during the interruption
in order to prevent the data network
from triggering a time-out.

4. A method according to claim 3, wherein the message
comprises a timing indication representing the time of
occurrence of the transmission.

5. A method according to claim 4, wherein the timing indication
comprises a sample count representing the number
of samples taken.

6. A method according to claim 4 or claim 5, wherein
the timing indication comprises a length count representing
the number of samples transmitted.

7. A method for audio transmission, comprising the
reception of a number of digital audio samples
over a data network (110) and the
generation of an audio output from these
samples, the method further comprising
the following steps:

receiving further audio samples
after an interruption of the transmission of the
audio samples,

receiving an indication of the duration of the
interruption of the transmission of the audio
samples, and

resuming the generation of the audio
output in accordance with the indication.

8. A method according to claim 7, wherein the step of
generating comprises compensating for transient
network load and clock drift.

9. A method according to claim 8, wherein the compensating
comprises the step of modifying the indication.

10. A system for audio transmission, comprising means
for receiving a plurality of digital audio samples
(410) in a memory (325)
and means for transmitting the digital audio samples
over a data network (110)
when a previously determined number of the
samples has been accumulated, the
system further comprising:

means for determining that a current sample
does not represent voice activity,

means for transmitting all samples stored in the
memory over the data network
regardless of whether the previously determined
number of samples is present,
and

means for interrupting the transmission of the
samples;

means for determining that a current sample
represents a resumption of voice activity,

means for resuming the transmission
of the samples, and

means for transmitting an indication of the
duration of the interruption.



Claims (Revendications)

1. A method for audio communications comprising
the reception of a plurality of digital audio
samples (410) in a memory (325) and
the transmission of said digital audio samples
over a data network (110) when a
predetermined number thereof have been accumulated,
said method further comprising the steps
of:

detecting that a current sample does not represent
voice activity, transmitting all
the samples stored in said memory
over said data network irrespective
of whether or not said predetermined number
thereof is present, and

suspending the transmission of said samples,

detecting that a current sample represents
a resumption of voice activity,

resuming said transmission of said samples,
and

transmitting an indication of the duration of said
suspension.

2. A method according to claim 1, further comprising
the step of maintaining a count (405) of the
number of samples, and wherein said indication
represents said count.

3. A method according to any one of the preceding
claims, further comprising the step
of:

periodically transmitting over said data
network, during said suspension, a message
in order to prevent said data network
from timing out.

4. A method according to claim 3, wherein said
message comprises a timing indication
representing the time of occurrence of said
transmission.

5. A method according to claim 4, wherein said
timing indication comprises a sample
count representing the number
of samples taken.

6. A method according to either of claims
4 or 5, wherein said timing indication
comprises a length count representing
the number of samples transmitted.

7. A method for audio communications comprising
the reception of a number of digital audio
samples over a data network (110)
and the production of an audio output therefrom,
said method further comprising the steps
of:

receiving further audio samples after a
suspension of the transmission of said audio
samples,

receiving an indication of the duration of the
suspension of the transmission of said audio samples,
and

resuming said production of said audio
output in accordance with said indication.

8. A method according to claim 7, wherein the step
of producing comprises compensating for
transient network load and clock drift.

9. A method according to claim 8, wherein said
compensating comprises the step of modifying
said indication.

10. A system for audio communications comprising
means for receiving a plurality
of digital audio samples (410) in a
memory (325) and means for transmitting
said digital audio samples over a data
network (110) when a predetermined number
thereof have been accumulated, said system
further comprising

means for detecting that a current sample
does not represent voice activity,

means for transmitting all the
samples stored in said memory over
said data network irrespective of
whether or not said predetermined number
thereof is present, and

means for suspending the transmission
of said samples,

means for detecting that a current sample
represents a resumption of voice activity,

means for resuming said transmission
of said samples, and

means for transmitting an indication
of the duration of said suspension.